
    Analyzing Inexact Hypergradients for Bilevel Learning

    Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here, the exact gradient with respect to the hyperparameters cannot feasibly be computed, and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods, and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver. Comment: Accepted to IMA Journal of Applied Mathematics.
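
    For orientation, the bilevel problem and the implicit-function-theorem form of the hypergradient that such frameworks approximate can be sketched as follows; the notation is generic and illustrative, not taken from the paper:

        \[
        \min_{\theta}\; F(\theta) := f\bigl(x^*(\theta),\theta\bigr)
        \quad\text{s.t.}\quad
        x^*(\theta) = \arg\min_{x}\, g(x,\theta),
        \qquad
        \nabla F(\theta) = \nabla_{\theta} f \;-\; \nabla^2_{\theta x} g\,\bigl(\nabla^2_{xx} g\bigr)^{-1} \nabla_{x} f,
        \]

    with all derivatives evaluated at (x^*(\theta), \theta). In practice, x^*(\theta) and the linear solve against \nabla^2_{xx} g can only be approximated, which is the inexactness whose effect the a priori and a posteriori bounds quantify.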

    On Optimal Regularization Parameters via Bilevel Learning

    Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity term with a regularizer. The regularizer is used to promote a priori information, and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Existing strategies such as the discrepancy principle and the L-curve can be used to determine a suitable parameter value, but in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework for determining optimal parameters, and involves solving a nested optimization problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still an active area of research. One necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that better characterizes positivity of optimal regularization parameters than the existing theory. Numerical results verify and explore this new condition for both small- and large-dimensional problems. Comment: 26 pages, 6 figures.
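
    As a sketch of the setting, written with generic notation rather than the paper's, variational regularization and the bilevel problem for the regularization parameter read:

        \[
        \hat{x}(\alpha) \in \arg\min_{x}\; \tfrac{1}{2}\|Ax - y\|_2^2 + \alpha R(x),
        \qquad
        \hat{\alpha} \in \arg\min_{\alpha \ge 0}\; \|\hat{x}(\alpha) - x^{\dagger}\|_2^2,
        \]

    where x^{\dagger} denotes ground-truth training data. The positivity question is whether the learned \hat{\alpha} is strictly greater than zero, so that the regularizer actually contributes to the reconstruction.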

    A temporal multiscale approach for MR Fingerprinting

    Quantitative MRI (qMRI) is becoming increasingly important for research and clinical applications, however, state-of-the-art reconstruction methods for qMRI are computationally prohibitive. We propose a temporal multiscale approach to reduce computation times in qMRI. Instead of computing exact gradients of the qMRI likelihood, we propose a novel approximation relying on the temporal smoothness of the data. These gradients are then used in a coarse-to-fine (C2F) approach, for example using coordinate descent. The C2F approach was also found to improve the accuracy of solutions, compared to similar methods where no multiscaling was used.Comment: 4 pages, 3 figures. Title revise
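
    As a rough illustration of the coarse-to-fine idea, the toy sketch below averages time frames to build cheaper coarse scales and uses the coarse fit to warm-start the finer ones. The inversion-recovery signal model, plain gradient descent, and the frame averaging are illustrative assumptions, not the paper's qMRI likelihood, its gradient approximation, or its coordinate-descent solver.

        # Toy temporal coarse-to-fine (C2F) parameter fit (illustrative only).
        import numpy as np

        def signal(t, T1):
            # toy per-voxel model: s(t; T1) = 1 - 2 exp(-t / T1)
            return 1.0 - 2.0 * np.exp(-t[:, None] / T1[None, :])

        def grad(t, T1, data):
            # gradient of 0.5 * sum_t (s(t; T1) - data_t)^2 with respect to T1
            r = signal(t, T1) - data
            ds = -2.0 * np.exp(-t[:, None] / T1[None, :]) * t[:, None] / T1[None, :] ** 2
            return (r * ds).sum(axis=0)

        def coarsen(t, data, factor):
            # average consecutive time frames (and their time stamps) -> coarser scale
            T = (len(t) // factor) * factor
            return (t[:T].reshape(-1, factor).mean(axis=1),
                    data[:T].reshape(-1, factor, data.shape[1]).mean(axis=1))

        rng = np.random.default_rng(0)
        t_fine = np.linspace(0.05, 3.0, 256)              # fine temporal grid (seconds)
        T1_true = rng.uniform(0.5, 2.0, size=100)         # one T1 value per voxel
        data = signal(t_fine, T1_true) + 0.01 * rng.standard_normal((256, 100))

        T1 = np.full(100, 1.0)                            # initial guess for every voxel
        for factor in (16, 4, 1):                         # coarse-to-fine schedule
            t_c, d_c = coarsen(t_fine, data, factor)      # cheaper gradients at coarse scales
            for _ in range(200):
                T1 = np.maximum(T1 - 1e-2 * grad(t_c, T1, d_c), 0.05)  # keep T1 positive
        print("mean abs T1 error:", np.abs(T1 - T1_true).mean())

    The point of such a schedule is that most iterations run on heavily averaged data, so only the final pass touches the full temporal resolution.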

    On the convergence and sampling of randomized primal-dual algorithms and their application to parallel MRI reconstruction

    Stochastic Primal-Dual Hybrid Gradient (SPDHG) is an algorithm to efficiently solve a wide class of nonsmooth large-scale optimization problems. In this paper we contribute to its theoretical foundations and prove its almost sure convergence for functionals that are convex but not necessarily strongly convex or smooth. We also prove its convergence for any sampling. In addition, we study SPDHG for parallel Magnetic Resonance Imaging reconstruction, where data from different coils are randomly selected at each iteration. We apply SPDHG using a wide range of random sampling methods and compare its performance across a range of settings, including mini-batch size and step size parameters. We show that the sampling can significantly affect the convergence speed of SPDHG and that in many cases an optimal sampling can be identified.
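
    For reference, here is a minimal sketch of the SPDHG iteration with uniform serial sampling, applied to a toy nonnegative least-squares problem rather than parallel MRI; the problem data, the nonnegativity constraint, and the step-size rule are illustrative assumptions:

        # SPDHG sketch for  min_{x >= 0}  sum_i 0.5 * ||A_i x - b_i||^2,
        # with one block A_i sampled uniformly at random per iteration.
        import numpy as np

        rng = np.random.default_rng(0)
        n_blocks, d = 8, 50
        A = [rng.standard_normal((30, d)) for _ in range(n_blocks)]  # one block per "coil"
        x_true = np.abs(rng.standard_normal(d))
        b = [Ai @ x_true for Ai in A]

        p = np.full(n_blocks, 1.0 / n_blocks)                  # uniform sampling probabilities
        norms = np.array([np.linalg.norm(Ai, 2) for Ai in A])  # operator norms ||A_i||
        rho = 0.99                                             # safety factor
        sigma = rho / norms                                    # dual step sizes sigma_i
        tau = rho * np.min(p / norms)                          # primal step: tau*sigma_i*||A_i||^2 <= p_i

        x = np.zeros(d)
        y = [np.zeros(Ai.shape[0]) for Ai in A]                # dual variables, one per block
        z = np.zeros(d)                                        # z = sum_i A_i^T y_i
        zbar = z.copy()

        for _ in range(5000):
            x = np.maximum(x - tau * zbar, 0.0)                # primal prox (nonnegativity)
            i = rng.integers(n_blocks)                         # sample one block
            # dual prox of sigma_i * f_i^* with f_i(v) = 0.5 * ||v - b_i||^2
            y_new = (y[i] + sigma[i] * (A[i] @ x - b[i])) / (1.0 + sigma[i])
            dz = A[i].T @ (y_new - y[i])
            y[i] = y_new
            z += dz
            zbar = z + dz / p[i]                               # extrapolation with theta = 1

        print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

    In this form, the sampling enters only through the probabilities p and the per-block step sizes, which is what makes its effect on convergence speed straightforward to explore.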